Compressing Integers for Fast File Access
Authors
Abstract
Fast access to files of integers is crucial for the efficient resolution of queries to databases. Integers are the basis of the indexes used to resolve queries, for example in large internet search systems, and numeric data forms a large part of most databases. Disk access costs can be reduced by compression, provided that the cost of retrieving a compressed representation from disk plus the CPU cost of decoding it is less than the cost of retrieving the uncompressed data. In this paper we show experimentally that, for both large and small collections, storing integers in a compressed format reduces the time required for either sequential stream access or random access. We compare different approaches to compressing integers, including the Elias gamma and delta codes, Golomb coding, and a variable-byte integer scheme. In conclusion, we recommend that, for fast access to integers, files be stored compressed.
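As a rough illustration of two of the schemes named in the abstract, the sketch below encodes and decodes integers with a variable-byte code and with the Elias gamma code. It is not taken from the paper: the function names are hypothetical, the byte layout (low-order 7 bits first, high bit set on every byte except the last) is one common variable-byte convention and may differ from the one the authors evaluated, and the gamma code is shown as a string of '0'/'1' characters for readability rather than as packed bits.

```python
def vbyte_encode(n):
    """Variable-byte code: 7 data bits per byte, low-order group first.
    Assumed convention: high bit set means another byte follows."""
    if n < 0:
        raise ValueError("variable-byte coding expects a non-negative integer")
    out = bytearray()
    while n >= 128:
        out.append((n & 0x7F) | 0x80)  # 7 payload bits plus continuation flag
        n >>= 7
    out.append(n)  # final byte, high bit clear
    return bytes(out)


def vbyte_decode(data, pos=0):
    """Decode one integer starting at data[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 128:  # high bit clear: this was the last byte
            return value, pos
        shift += 7


def gamma_encode(n):
    """Elias gamma code of n >= 1 as a string of '0'/'1' characters:
    (len - 1) zeros, then the binary form of n."""
    if n < 1:
        raise ValueError("Elias gamma coding expects an integer >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits, pos=0):
    """Decode one gamma-coded integer starting at bits[pos];
    return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end


if __name__ == "__main__":
    for n in (1, 5, 127, 128, 300):
        assert vbyte_decode(vbyte_encode(n))[0] == n
        assert gamma_decode(gamma_encode(n))[0] == n
    print(len(vbyte_encode(300)), gamma_encode(5))
```

Under these assumptions, small values occupy a single byte in the variable-byte code and only a few bits in the gamma code, which is where the savings over fixed-width integers come from. The variable-byte code stays byte-aligned, so it decodes with simple shifts and masks, while the gamma code is bit-aligned and more compact for very small values.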
Similar resources
Storage system for document imaging applications
Document images can be stored compactly using JBIG. The main drawback of the method is the lack of direct access to the compressed image file (spatial access). Here we propose a storage system based on JBIG so that spatial access is also supported. For facsimile images, the increase in the file size is only about 10 %. With this cost we achieve also a fast preview to the image using only about ...
FQbin a compatible and optimized format for storing and managing sequence data
Existing hardware environments may be stressed when storing and processing the enormous amount of data generated by next-generation sequencing technology. Here, we propose FQbin, a novel and versatile tool in C for compressing, storing and reading such sequencing data in a new and Fasta/FastQ-compatible format that outperforms the existing proposals. It is based on the general-purpose zLib libra...
Fast-shock ignition: a novel approach to inertial confinement fusion
A new concept for inertial confinement fusion called fast-shock ignition (FSI) is introduced as a credible scheme in order to obtain high target gain. In the proposed model, the separation of fuel ignition into two successive steps, under the suitable conditions, reduces required ignitor energy for the fuel ignition. The main procedure in FSI concept is compressing the fuel up to stagnation. T...
Using Partitions and Superstrings for Lossless Compression of Pattern Databases
We present an algorithm for compressing pattern databases (PDBs) and a method for fast random access of these compressed PDBs. We demonstrate the effectiveness of our technique by compressing two 6-tile sliding-tile PDBs by a factor of 12 and a 7-tile sliding-tile PDB by a factor of 24.
Exploring compression techniques for ROOT IO
ROOT provides a flexible format used throughout the HEP community. The number of use cases from an archival data format to end-stage analysis has required a number of tradeoffs to be exposed to the user. For example, a high “compression level” in the traditional DEFLATE algorithm will result in a smaller file (saving disk space) at the cost of slower decompression (costing CPU time when read)....
Journal:
- Comput. J.
Volume 42, Issue
Pages -
Publication date: 1999